Homework 5 - Documenting an HR database

As a highly prized and chased database wizard, you’ve been hired at an exorbitant hourly rate asked to document an HR database of a fortune 1_000_000 company.

You’ve been given credentials to the database. (See course slides and/or lecture video!)

Your tasks are to submit a report that contains:

  1. A general written description of contents of database,
  2. A Chen diagram highlighting the major entities and relations,
  3. A Crows-foot diagram highlighting the entities, attributes, and the cardinality and participation of the relations,
  4. A python-generated listing of SQL tables and fields, and
  5. A listing of sample data showing the first bunch (5 or 10 or so) of EACH table.
  6. All this will be documented in the report and the HTML report will be uploaded to canvas.

Github repo: https://github.com/cmsc-vcu/cmsc408-fa2025-hw5-CaptnKristmas

Database design

The following sections document the general design of the HR database.

General description

This database houses the HR information for a company. It stores and organizes informaiton about: employees, jobs, departments, locations, and previous employment. Some of the things it tracks includes: salaries of emplyees, their previous work history, who works in what department, their positions and the locations of departments.

The HR database contains a total of 9 tables 1 of which being a View and the other 8 Base Tables. These include: countries, departments, emp_details_view, employees, job_grades, job_history, jobs, locations and regions. Each holds parts of the information and are connected via relationships showing how that information is connected in different ways. Through the following Chen and Crow’s Foot diagrams one will be able to see how these relationships work.

Chen diagram

The following diagram shows the relationships between tables in the database using a chen diagram.

ER REGIONS REGIONS region_id region_id (PK) REGIONS--region_id region_name region_name REGIONS--region_name HAS_REGION HAS_REGION REGIONS--HAS_REGION COUNTRIES COUNTRIES country_id0 country_id (PK) COUNTRIES--country_id0 country_name country_name COUNTRIES--country_name region_fk region_id (FK) COUNTRIES--region_fk LOCATED_IN_COUNTRY LOCATED_IN_COUNTRY COUNTRIES--LOCATED_IN_COUNTRY LOCATIONS LOCATIONS country_id1 country_id (FK) LOCATIONS--country_id1 location_id0 location_id (PK) LOCATIONS--location_id0 street_address street_address LOCATIONS--street_address postal_code postal_code LOCATIONS--postal_code city city LOCATIONS--city state_province state_province LOCATIONS--state_province LOCATED_AT LOCATED_AT LOCATIONS--LOCATED_AT DEPARTMENTS DEPARTMENTS manager_id0 manager_id (FK) DEPARTMENTS--manager_id0 location_id1 location_id (FK) DEPARTMENTS--location_id1 department_id0 department_id (PK) DEPARTMENTS--department_id0 department_name department_name DEPARTMENTS--department_name EMPLOYS EMPLOYS DEPARTMENTS--EMPLOYS RECORDED_IN RECORDED_IN DEPARTMENTS--RECORDED_IN EMPLOYEES EMPLOYEES job_id2 job_id (FK) EMPLOYEES--job_id2 manager_id1 manager_id (FK) EMPLOYEES--manager_id1 department_id1 department_id (FK) EMPLOYEES--department_id1 employee_id0 employee_id (PK) EMPLOYEES--employee_id0 first_name first_name EMPLOYEES--first_name last_name last_name EMPLOYEES--last_name email email (UNI) EMPLOYEES--email phone_number phone_number EMPLOYEES--phone_number hire_date hire_date EMPLOYEES--hire_date salary salary EMPLOYEES--salary commission_pct commission_pct EMPLOYEES--commission_pct REPORTS_TO REPORTS_TO EMPLOYEES--REPORTS_TO EMPLOYEES--RECORDED_IN BELONGS_TO_GRADE BELONGS_TO_GRADE EMPLOYEES--BELONGS_TO_GRADE by salary range JOBS JOBS job_id0 job_id (PK) JOBS--job_id0 job_title job_title JOBS--job_title min_salary min_salary JOBS--min_salary max_salary max_salary JOBS--max_salary HAS_JOB HAS_JOB JOBS--HAS_JOB JOBS--RECORDED_IN JOB_HISTORY JOB_HISTORY job_id1 job_id (FK) JOB_HISTORY--job_id1 department_id2 department_id (FK) JOB_HISTORY--department_id2 employee_id1 employee_id (PK) JOB_HISTORY--employee_id1 start_date start_date (PK) JOB_HISTORY--start_date end_date end_date (PK) JOB_HISTORY--end_date JOB_GRADES JOB_GRADES grade_level grade_level (PK) JOB_GRADES--grade_level lowest_sal lowest_sal JOB_GRADES--lowest_sal highest_sal highest_sal JOB_GRADES--highest_sal HAS_REGION--COUNTRIES 1,N LOCATED_IN_COUNTRY--LOCATIONS 1,N LOCATED_AT--DEPARTMENTS 1,N EMPLOYS--EMPLOYEES 1,N HAS_JOB--EMPLOYEES 1,N REPORTS_TO--EMPLOYEES 1,N RECORDED_IN--JOB_HISTORY 1,N RECORDED_IN--JOB_HISTORY 1,N RECORDED_IN--JOB_HISTORY 0,N BELONGS_TO_GRADE--JOB_GRADES derived
Figure 1: HR Database — Chen ER Diagram (Complete with Job Grades)

Crows foot diagram

The following diagram shows the relationships between tables as a crows foot diagram.

  erDiagram
  REGIONS {
    int region_id PK
    varchar(25) region_name
  }

  COUNTRIES {
    char(2) country_id PK
    varchar(40) country_name
    int region_id FK
  }

  LOCATIONS {
    int location_id PK
    varchar(40) street_address
    varchar(12) postal_code
    varchar(30) city
    varchar(25) state_province
    char(2) country_id FK
  }

  DEPARTMENTS {
    int department_id PK
    varchar(30) department_name
    int manager_id FK
    int location_id FK
  }

  JOBS {
    varchar(10) job_id PK
    varchar(35) job_title
    float min_salary
    float max_salary
  }

  EMPLOYEES {
    int employee_id PK
    varchar(20) first_name
    varchar(25) last_name
    varchar(25) email UK
    varchar(20) phone_number
    date hire_date
    varchar(10) job_id FK
    decimal salary
    decimal commission_pct
    int manager_id FK
    int department_id FK
  }

  JOB_HISTORY {
    int employee_id PK
    date start_date PK
    date end_date PK
    varchar(10) job_id FK
    int department_id FK
  }

  JOB_GRADES {
  varchar(3) grade_level PK
  float lowest_sal
  float highest_sal
  }

  REGIONS ||--o{ COUNTRIES : "has"
  COUNTRIES ||--o{ LOCATIONS : "has"
  LOCATIONS ||--o{ DEPARTMENTS : "hosts"
  DEPARTMENTS ||--o{ EMPLOYEES : "employs"
  JOBS ||--o{ EMPLOYEES : "assigned"
  EMPLOYEES ||--o{ JOB_HISTORY : "has"
  JOBS ||--o{ JOB_HISTORY : "for_job"
  DEPARTMENTS o|--o{ JOB_HISTORY : "worked_in"
  EMPLOYEES ||--o{ EMPLOYEES : "manages"
  JOB_GRADES ||--o{ EMPLOYEES : "grade"

Listing of tables in the database

The python blocks that follow show how to login and pull from a database in order to create an html that displays information from said database. The python code blocks are what is needed to be used in order to achieve such a task.

Create connection

This code is loading credentials from the environment file (.env) in order to log into the database to pull further information.

from itables import show
from helpers import create_database_engine, run_sql_and_return_df, run_sql_and_return_html, create_db_wrapper

# Load these variables from .env file.
config_map = {
  'user'    : "MYSQL_USERNAME",
  'password': "MYSQL_PASSWORD",
  'host'    : "MYSQL_HOST",
  'database': "HW5_DB"
}

cnx,config = create_db_wrapper( config_map )

List of tables found in HR database

This querries the Information_schema in MySQL in order to return a list of the tables in the database.

df = run_sql_and_return_df(cnx,f"""
SELECT 
  TABLE_NAME, TABLE_TYPE
FROM
  INFORMATION_SCHEMA.TABLES 
WHERE
  TABLE_SCHEMA = '{config['database']}'
ORDER BY
  TABLE_NAME
""",cnx)
show(df)

cnx,config = create_db_wrapper( config_map ) 
Loading ITables v2.5.2 from the internet... (need help?)

Listing of Tables and sample data

The following shows what data is stored on the database along with explanations of the various tables. It should help one understand how the data relates to each other.

Table: countries

The countries table contains data such as the type, the type of key and the field. It contains data relating to countries and regions.

Schema: countries

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE countries
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: countries

df = run_sql_and_return_df(cnx,"""
SELECT * FROM countries LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: departments

The departments table contains data such as the field, type, and type of key. It contains data relating to departments their locations and managers.

Schema: departments

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE departments
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: departments

df = run_sql_and_return_df(cnx,"""
SELECT * FROM departments LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: emp_details_view

The employee details view table contains data such as the field and type. It contains data relating to employees, their departments, locations, managers and jobs.

Schema: emp_details_view

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE emp_details_view
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: emp_details_view

df = run_sql_and_return_df(cnx,"""
SELECT * FROM emp_details_view LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: employees

The employees table contains data such as the field, type and key. It contains data relating to employees, their departments, locations, managers and jobs. This expands on the previous table by adding information such as employees emails and phone numbers along with their names, hire dates, salaries and commision percentage.

Schema: employees

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE employees
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: employees

df = run_sql_and_return_df(cnx,"""
SELECT * FROM employees LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: job_grades

The job_grades table contains data such as the field, type and the key. This data relates to the highest and lowest salaries for a jobs grade.

Schema: job_grades

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE job_grades
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: job_grades

df = run_sql_and_return_df(cnx,"""
SELECT * FROM job_grades LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: job_history

The job_history table contains data such as the field, type and the key. This data relates employees to their start and end dates at the company.

Schema: job_history

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE job_history
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: job_history

df = run_sql_and_return_df(cnx,"""
SELECT * FROM job_history LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: jobs

The jobs table contains data such as field type and key. This data relates jobs to their max and minimum salaries.

Schema: jobs

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE jobs
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: jobs

df = run_sql_and_return_df(cnx,"""
SELECT * FROM jobs LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: locations

The locations table contains data such as the field, type and key. This data relates the street addresses on file to their respective cities, postal codes, states/provinces and countries.

Schema: locations

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE locations
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: locations

df = run_sql_and_return_df(cnx,"""
SELECT * FROM locations LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Table: regions

The regions table contains data such as field, type and key. This data relates to a list of regions.

Schema: regions

# Table schema
df = run_sql_and_return_df(cnx,"""
DESCRIBE regions
""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Sample data: regions

df = run_sql_and_return_df(cnx,"""
SELECT * FROM regions LIMIT 10""")
show(df)
Loading ITables v2.5.2 from the internet... (need help?)

Reflection

Please add a paragraph or two about how this project went.
I discuss this further in the section asking about how the information provided was. I encountered major difficulties during set-up that took multiple full days to resolve. Unfortunately, this was by far the hardest set-up I have ever encountered. I once had issues setting up Windows Side Loader but that still was miles smoother than this experience.

I did not encounter any massive issues while working on the project itself once everything was downloaded beyond some syntax errors that took a while to correct. I also encountered a weird issue where I had to uninstall itables, the pip cache and jupyter. Once reinstalling the issue was resolved.

Was it harder or easier than you expected?
This ended up being easier then expected once everything was installed and working but up to that point it was incredibly difficult.
Did the instructor provide too much information or not enough information?
I encountered issues while following the directions and ended up having to create work around in order to complete the process. Due to these work arounds I had issues having the libraries pre working on this poetry shell. As such I had to download every library the project uses. Additionally, despite the base python 3.11.6 having Jupyter it did not extend to the poetry shell. I was unable to see an environment for the shell in the terminal and had to create an environement specifically in the folder of the repo. This likely caused the issues but unfortunately I was unable to do this the way the directions instructed.

One major recommendation I have is to double check previous Python installs. I somehow had multiple of the same python version installed and had to go through and clean that out. This process along with properly setting up shims and the bin folder for pyenv caused the issues that ultimately led to the poetry workaround.

Poetry required an uninstall which after was not working due to the system seeing it as installed despite it not appear anywhere. This led to workaround that caused poetry env info to show no environement. Ultimately ending up at the solution I mentioned.

Do you have suggestions for how it can be improved?
Better clarification for error solving and possibly a warning to check certain things prior to starting. In addition to the small uninstalls it asks you to make to double check.

README

Homework 5 — Documenting an HR Database

Author: Vladimir Paraschiv
Course: CMSC 408 – Database Systems
Semester: Fall 2025
Date: October 17, 2025
Repository


📘 Overview

This project documents a Human Resources (HR) database for a large enterprise system.

The assignment focuses on exploring an existing relational schema, understanding how each table contributes to the organization’s HR functions, and visualizing relationships between entities using Chen and Crow’s Foot notations.

The report integrates SQL-generated metadata, sample data, and ER diagrams, offering both a conceptual and physical view of the database.

The deliverable demonstrates proficiency in connecting to a live database, querying the SQL Database, and displaying formatted tables directly within a Quarto HTML report.


🧩 Tools and Technologies

This project uses:

  • Quarto — for professional report generation combining text, SQL, and visuals
  • Graphviz (DOT) — to produce a Chen-style ER diagram
  • Mermaid — to produce a Crow’s Foot diagram with cardinalities
  • Python (3.11.6) — for database connection and querying
  • itables — for rendering interactive HTML tables
  • PyMySQL / SQLAlchemy / dotenv — for MySQL access and environment-variable handling
  • Poetry — to manage dependencies and virtual environments
  • Pyenv — to create isolated Python environments

🧠 Database Summary

The HR database consists of nine tables (8 base tables + 1 view) that collectively describe a company’s workforce, structure, and geography:

  • regions, countries, locations, departments, employees, jobs, job_history, job_grades, and emp_details_view

These tables capture hierarchical and referential relationships such as: - Region → Country → Location → Department → Employee
- Jobs and Job Grades defining employee salary ranges and titles
- Historical data for past positions and department changes

The diagrams and table listings reveal how foreign-key relationships enforce integrity across this structure.


🧪 Rendering Instructions

To render the Quarto report into a standalone HTML document, run in Virtual Studio Code terminal: cd ./reports and then: quarto render report.qmd

Alternatively, simply open the pre-rendered report.html file included in this repository.